■ Cluster Basics 

■ High Availability 

■ Manageability 

■ Scalability 

■ Application and Service Support 



Cluster Basics 






What is a server "cluster"? 

A server cluster is a group of independent servers managed as a single 
system for higher availability, manageability, and scalability. 

What does it take to create a server cluster? 

The minimum requirements for a server cluster are (a) two servers 
connected by a network, (b) a method for each server to access the other's 
disk data, and (c) special cluster software like Microsoft Cluster Server 
(MSCS). The special software provides services such as failure detection, 
recovery, and the ability to manage the servers as a single system. 

What are the features of server clustering? 

There are three primary features to server clustering: availability, 
manageability, and scalability. Using Microsoft Cluster Server as an example: 

■ Availability: MSCS can automatically detect the failure of an application or 
server, and quickly restart it on a surviving server. Users only experience 
a momentary pause in service. 

■ Manageability: MSCS lets administrators quickly inspect the status of all 
cluster resources, and move workload around onto different servers within 
the cluster. This is useful for manual load balancing, and to perform 
"rolling updates" on the servers without taking important data and 
applications offline. 

■ Scalability: "Cluster-aware" applications can use the MSCS services 
through the MSCS Application Programming Interface (API) to do dynamic 
load balancing and scale across multiple servers within a cluster. 

What are clusters used for? 

Customer surveys indicate that MSCS clusters are used as highly available 
multipurpose platforms, mirroring the current uses of the Microsoft Windows 
NT Server operating system. Surveyed customers suggested that the most 
common uses of MSCS clusters are mission-critical database management, 
file/intranet data sharing, messaging, and general business applications. 

When a cluster is recovering from a server failure, how does the 
surviving server get access to the failed server's disk data? 

There are basically three techniques that clusters use to make disk data 
available to more than one server: 

■ Shared disks: The earliest server clusters permitted every server to 
access every disk. This originally required expensive cabling and switches, 
plus specialized software and applications. (The specialized software that 
mediates access to shared disks is generally called a Distributed Lock 
Manager, or DLM.) Today, standards like SCSI have eliminated the 
requirement for expensive cabling and switches. However, shared-disk 
clustering still requires specially modified applications. This means it is not 
broadly useful for the wide variety of applications deployed on the millions 
of servers sold each year. Shared-disk clustering also has inherent limits 
on scalability since DLM contention grows geometrically as you add 
servers to the cluster. Examples of shared-disk clustering solutions 
include Digital VAX Clusters and Oracle Parallel Server. 

■ Mirrored disks: A flexible alternative is to let each server have its own 
disks, and to run software that "mirrors" every write from one server to a 
copy of the data on at least one other server. This is a great technique for 
keeping data at a disaster recovery site in synch with a primary server. A 
large number of disk mirroring solutions are available today; examples for 
the Windows NT Server environment are available from Network 
Specialists (NSI), Octopus, Veritas, and Vinca. Many of these mirroring 
vendors also offer cluster-like high-availability extensions that can switch 
workload over to a different server using a mirrored copy of data. 
However, mirrored-disk failover solutions cannot deliver the scalability 
benefits of clusters. It is also arguable that they can never deliver as high 
a level of availability and manageability as shared-disk clustering since 
there is always a finite amount of time during the mirroring operation in 
which the data at both servers is not 100 percent identical. 

■ "Shared nothing": In response to the limitations of shared-disk clustering, 
modern cluster solutions employ a "shared nothing" architecture in which 
each server owns its own disk resources (that is, they share "nothing" at 
any point in time). In the event of a server failure, a shared-nothing 
cluster has software that can transfer ownership of a disk from one server 
to another. This provides the same high level of availability as shared-disk 
clusters, and potentially higher scalability since it does not have the 
inherent bottleneck of a DLM. Best of all, it works with standard 
applications since there are no special disk access requirements. Examples 
of shared-nothing clustering solutions include Tandem NonStop, Informix 
Online/XPS, and Microsoft Cluster Server. 

Intro to Microsoft Cluster Server 



What is "Wolfpack"? 

"Wolfpack" was the code name for Microsoft Cluster Server. 

What is Microsoft Cluster Server (MSCS)? 

MSCS is a built-in feature of Windows NT Server, Enterprise Edition. It is 
software that supports the connection of two servers into a "cluster" for 
higher availability and easier manageability of data and applications. MSCS 
can automatically detect and recover from server or application failures. It 
can be used to move server workload to balance utilization and to provide for 
planned maintenance without downtime. And, over time, MSCS will also 
become a platform for highly scalable, cluster-aware applications. 

How many servers can be in an MSCS cluster? 

The initial release of MSCS will be supported on clusters with two servers. A 
future version referred to as MSCS "Phase 2" will support larger clusters, and 
will include enhanced services to simplify the creation of highly scalable, 
cluster-aware applications. 

What other companies were involved in the development of MSCS? 

Microsoft worked closely with leading hardware vendors, software vendors, 
and customers in the specification and development of MSCS and its API. 
These other companies participated through five different programs: 

■ Strategic alliances: Microsoft formed strategic alliances with two of the 
key pioneers in clustering technology: Digital Equipment Corporation (in 
1995) and Tandem Computers (in 1996). In both of these alliances, 
patent portfolios were cross-licensed, and Microsoft gained access to 
proven clustering expertise and technology, plus a strong partner 
committed to helping extend that technology to benefit customers of 
Windows NT Server. 

■ Early Adopter vendors: Starting with the announcement of the MSCS 
project in October 1995 and extending through the beta test program, 
Microsoft worked closely with six leading system vendors who provided 
support, expertise, and sample cluster configurations to support the 
development of MSCS. The Early Adopter system vendors were Compaq 
Computer Corporation, Digital Equipment, Hewlett-Packard, IBM, NCR, 
and Tandem Computers. 

■ Open Process: Whenever Microsoft extends the Microsoft Win32® API, as 
it did with MSCS, it enlists the participation of vendors and customers in 
its "Open Process." This is a series of confidential design previews and 
specification reviews that assures that the resulting API is robust, 
complete, and usable by a broad segment of the industry. More than 60 
organizations participated in the MSCS Open Process sessions, which took 
place between January and July of 1996. 

■ SDK previews: Microsoft first provided early copies of the MSCS Software 
Development Kit to the 60+ Open Process organizations in September of 
1996, and distributed a more advanced preview SDK to more than 2,000 
developers at the November 1996 Microsoft Professional Developers 
Conference. 

■ Beta test program: MSCS Beta 1 was shipped in December 1996 to 350 
customer and vendor sites. Beta 2 shipped in April 1997 to more than 750 
sites. And Beta 3 of MSCS was shipped as an embedded feature of 
Windows NT Server, Enterprise Edition 4.0 Beta 2 in July 1997 to more 
than 2,100 sites. Each of these betas was also available to thousands of 
additional developers and customers through the Microsoft Developers 
Network (MSDN) Level III. 

In what languages will MSCS be available? 

Microsoft Windows NT Server, Enterprise Edition 4.0, which includes MSCS 
1.0, will be offered in English, French, German, Japanese, and Spanish. 

Through what channels will Windows NT Server, Enterprise Edition 
be available? 

Microsoft Windows NT Server, Enterprise Edition will be available to 
customers through all standard channels: reseller, retail, OEM, and the 
Microsoft Select licensing program. 

What versions of Windows NT Server will MSCS support? 

MSCS software will only be available as a built-in feature of Windows NT 
Server, Enterprise Edition. 

Will MSCS be extended beyond Windows NT Server to Windows NT 
Workstation? 

There is currently no plan to extend cluster support to Windows NT 
Workstation. MSCS software has been designed and written to closely 
integrate with the architecture and features of Windows NT Server, including 
its server-oriented networking and directory services capabilities. 

What clients can connect to an MSCS cluster? 

Any client that can connect to Windows NT Server through TCP/IP will work 
with MSCS. This includes Microsoft MS-DOS, Microsoft Windows 3.x, Windows 
95, Windows NT, Apple Macintosh, and UNIX. MSCS does not require any 
special software on the client for transparent recovery of services that 
connect to clients through standard IP protocols. 



High Availability 



How does MSCS provide high availability? 

MSCS uses software "heartbeats" to detect failed applications or servers. In 
the event of a server failure, it employs a "shared nothing" clustering 
architecture that automatically transfers ownership of resources (such as disk 
drives and IP addresses) from a failed server to a surviving server. It then 
restarts the failed server's workload on the surviving server. All of this, from 
detection to restart, typically takes under a minute. If an individual 
application fails (but the server does not), MSCS will typically try to restart 
the application on the same server; if that fails, it moves the application's 
resources and restarts it on the other server. The cluster administrator can 
use a graphical console to set various recovery policies, such as 
dependencies between applications, whether or not to restart an application 
on the same server, and whether or not to automatically 
"failback" (rebalance) workloads when a failed server comes back online. 

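The detection and recovery policy described above can be sketched roughly as follows. This is only an illustration in Python, not MSCS code: the names `RecoveryPolicy`, `server_is_alive`, and `choose_recovery_action`, the 5-second heartbeat timeout, and the restart limit are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPolicy:
    """Hypothetical per-application recovery policy (field names are illustrative)."""
    restart_on_same_server: bool = True
    max_local_restarts: int = 1


def server_is_alive(last_heartbeat: float, now: float, timeout_seconds: float = 5.0) -> bool:
    """Treat a server as failed once its heartbeat is older than the timeout.

    The 5-second default is an assumed value, not an MSCS setting.
    """
    return (now - last_heartbeat) <= timeout_seconds


def choose_recovery_action(server_alive: bool, local_restart_attempts: int,
                           policy: RecoveryPolicy) -> str:
    """Return 'restart_locally' or 'fail_over' for a failed application."""
    if not server_alive:
        # The whole server is gone: move its resources to the surviving server.
        return "fail_over"
    if policy.restart_on_same_server and local_restart_attempts < policy.max_local_restarts:
        # Only the application failed: try restarting it where it is first.
        return "restart_locally"
    # Local restarts are disabled or exhausted: move to the other server.
    return "fail_over"
```

With the default policy, an application failure on a healthy server is first restarted locally and fails over only if that restart does not help.
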
Can MSCS provide "zero downtime"? 

No. MSCS can dramatically reduce planned and unplanned downtime. 
However, even with MSCS, a server could still experience downtime from the 
following events: 

■ MSCS failover time: If MSCS recovers from a server or application failure, 
or if it is used to move applications from one server to another, the 
application(s) will be unavailable for a non-zero period of time (typically 
under a minute). 



■ Failures that MSCS can't recover from: There are types of failure that MSCS 
does not protect against, such as loss of a disk not protected by RAID, 
loss of power when a UPS isn't used, or loss of a site when there's no 
fast-recovery disaster recovery plan, but most of these can be survived 
with minimal downtime if precautions are taken in advance. 

■ Server maintenance that requires downtime: MSCS can keep applications 
and data online through many types of server maintenance, but not all 
(for example: completely upgrading both servers in a cluster, or installing 
a new version of an application which has a new on-disk data format that 
requires reformatting preexisting data). 

Microsoft recommends that clusters be used as one element in customers' 
overall programs to provide high integrity and high availability for their 
mission-critical server-based data and applications. 

Is MSCS failover transparent to users? 

MSCS does not require any special software on client computers, so the user 
experience during failover depends on the nature of the client side of their 
client-server application. Client reconnection is often transparent, because 
MSCS has restarted the applications, file shares, and so on, at exactly the 
same IP address. 

If a client is using "state-less" connections such as a standard browser 
connection, then it would be unaware of a failover if it occurred between 
server requests. If a failure occurs while a client is connected to the failed 
resource, then the client will receive whatever standard notification is 
provided by the client side of the application in use when the server side 
becomes unavailable. This might be, for example, the standard "Abort, Retry, 
or Cancel?" prompt you get when using Windows Explorer to download a file 
at the time a server or network goes down. In this case, client reconnection 
is not automatic (the user must choose "Retry"), but the user is fully 
informed of what's happening and has a simple, well-understood method of 
reestablishing contact with the server. Of course, in the meantime, MSCS is 
busily restarting the service or application so that, when the user chooses 
"Retry," it reappears as if it never went away. 

For client-side applications that have "state-full" connections to the server, a 
new logon is typically required following a server failure. In many cases, this 
approach is required for security purposes. For example, this is how SAP R/3 
works: if the server connection is lost, the user is prompted to log on again 
to make sure it's the same user accessing the application. 

Even with state-full connections, it's possible for an application to 
automatically reconnect following a failover. For example, when Microsoft 
demonstrated SAP R/3 failover at Microsoft Scalability Day, it was accessed 
through an Active browser application that had automatically (and securely) 
cached the user's ID and password from the initial logon. Thus, when the 
server connection was momentarily lost during the failover demo, the client 
application automatically logged on again using the cached ID and password. 
This was done using standard IP connections, running a simple Microsoft 
Visual Basic development system program within an HTML document through 
the Microsoft ActiveX technology. 
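
The automatic reconnection described above can be sketched as a small retry loop. This is a hedged illustration in Python rather than the Visual Basic program used in the demonstration: `call_with_reconnect`, its parameters, and the `connect`/`request` callables are hypothetical names, and a real client would need to store the cached credentials securely.

```python
import time


def call_with_reconnect(connect, request, credentials, retries=3, delay=0.0):
    """Run request(session); on a lost connection, log on again and retry.

    connect(credentials) returns a session object; request(session) performs
    the work. Both are supplied by the caller (hypothetical interfaces).
    """
    session = connect(credentials)
    for attempt in range(retries + 1):
        try:
            return request(session)
        except ConnectionError:
            if attempt == retries:
                raise  # give up after the final retry
            time.sleep(delay)  # give the cluster time to restart the service
            # Re-establish the session using the cached ID and password.
            session = connect(credentials)
```

Because the restarted service is reachable at the same address, the client only needs to repeat the logon; nothing about the cluster has to be known on the client side.
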

When a server comes back online following a failure, is there any 
human intervention required to get it back "up and running," or is 
the heartbeat enough for the other server to include it once again? 

No manual intervention is required. When a server running Microsoft Cluster 
Server, say "Server A," boots, it starts the MSCS service automatically. MSCS 
in turn checks the interconnect (and network if necessary) to find the other 
server in its cluster, say "Server B." If Server A finds Server B, then Server A 
rejoins the cluster and Server B updates it with current cluster status info. 
Server A then initiates "failback," moving back failed-over workload from 
Server B to Server A at an appropriate time. 

What is "failback," and how does it work in MSCS? 

"Failback" is the ability to automatically rebalance the workload in a cluster 
when a failed server comes back online. This is a standard feature of MSCS. 
For example, say "Server A" has crashed and its workload failed over to 
"Server B." When Server A reboots, it automatically finds Server B and 
rejoins the cluster. It then checks to see if any of the cluster groups running 
on Server B would "prefer" to be running on Server A. If so, it automatically 
moves those groups from Server B to Server A as soon as the time is right. 
Failback properties (that is, which groups can fail back, which is their 
preferred server, and during what hours the time is "right" for failback) are 
all set from the cluster administration console. 
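
The failback decision can be sketched in a few lines. This is a minimal Python illustration, assuming each group carries the three properties named above; the `Group` class, its field names, and the hour-based window format are hypothetical and do not reflect how MSCS stores these settings.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Group:
    """Hypothetical cluster group with its failback properties."""
    name: str
    current_owner: str
    preferred_owner: Optional[str] = None
    allow_failback: bool = False
    # (start_hour, end_hour) in 24-hour time; None means failback at any time.
    failback_window: Optional[Tuple[int, int]] = None


def in_window(hour: int, window: Optional[Tuple[int, int]]) -> bool:
    """True if hour falls inside the window; windows may wrap past midnight."""
    if window is None:
        return True
    start, end = window
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def groups_to_fail_back(groups: List[Group], rejoined_server: str, hour: int) -> List[str]:
    """Names of groups that should move back to the server that just rejoined."""
    return [
        g.name
        for g in groups
        if g.allow_failback
        and g.preferred_owner == rejoined_server
        and g.current_owner != rejoined_server
        and in_window(hour, g.failback_window)
    ]
```

For example, a group that prefers Server A with a window of 22:00 to 06:00 stays on Server B at noon and moves back only once the window opens.
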

Can the servers in an MSCS cluster be located at separate locations 
for recovery from site disasters? 

Not at this time. All of the cluster configurations currently being considered 
for validation use SCSI connections to storage resources, which limits the 
distance between clustered servers to the distance supported by standard 
SCSI. This is typically no more than 25 meters, though there are SCSI 
extender technologies that can potentially stretch the connection up to 1,000 
meters. 

Note that Windows NT Server customers already have several choices for 
software that can mirror data to remote disaster recovery sites, including 
solutions from NSI, Octopus, Veritas, and Vinca. Most of these vendors 
have already announced that their disaster site mirroring solutions will also 
work with MSCS clusters. 

Can MSCS restore registry keys for an application from one server to 
the other when doing failover? 

Yes. Recovery of an application's registry information is a configurable 
feature that is available to the Generic Application and Generic Service 
resource types. Basically, you tell it what registry keys to log and recover, 
and that's all there is to it. This capability should be used if the application or 
service stores volatile information in specific registry keys. If this is done, 
when the resource comes online on another node, it will have the same 
registry information as the previously online resource. 

When an application restarts on another server following a failure, 
does it restart from a copy of the application? 

No. The new server (say, "Server 2") would start the application from the 
same physical disks as Server 1, since ownership of the application's disks on 
the shared SCSI bus had been moved from Server 1 to Server 2 as one of 
the first steps in the failover process. This approach assures that the 
application always restarts from its last known state, as recorded on its disk 
drives (and, if you use the available option, as recorded in its registry keys). 

Can MSCS restore an application's "state" at the time of its failure 
rather than requiring a complete restart? 

MSCS can restore the state of an application's registry keys, but any other 
state information must be managed and restored by the application. 
Applications need to provide some model for persistence to ensure that state 
can be recaptured. For example, Microsoft SQL Server™ uses transaction 
logs to provide this assurance. If a server running Microsoft SQL Server 
crashes, upon restart the application uses its transaction logs to bring the 
database back to a known state. With a cluster, just as with a single server, 
good application design and the use of ACID (Atomic, Consistent, Isolated, 
and Durable) transaction properties are important. 
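
As a rough illustration of log-based recovery, the sketch below rebuilds state after a restart by replaying only the writes of committed transactions. It is a deliberately simplified model in Python and is not how SQL Server implements recovery; the record format and the name `recover_state` are assumptions made for the example.

```python
def recover_state(log):
    """Rebuild key/value state from a transaction log after a restart.

    Each record is a tuple (hypothetical format):
      ("begin", tx), ("set", tx, key, value), or ("commit", tx).
    Writes from transactions without a commit record are discarded, so the
    state returns to the last point that is known to be consistent.
    """
    committed = {record[1] for record in log if record[0] == "commit"}
    state = {}
    for record in log:
        if record[0] == "set" and record[1] in committed:
            _, _, key, value = record
            state[key] = value
    return state
```

If the server crashes between a write and its commit record, the restarted application (on either node) simply ignores the unfinished transaction.
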

What is the granularity of resource failover? 

MSCS supports failover of "virtual servers," which usually correspond to 
applications, Web sites, print queues, or file shares (including their disk 
spindles, files, IP addresses, and so on). MSCS also provides cluster-wide 
services that are simultaneously available on all servers in the cluster, 
including cluster administration, performance monitoring, event viewing, a 
cluster name, and cluster time synchronization. 

What is a "quorum disk" and how does it help MSCS provide high 
availability? 

It's a disk spindle that MSCS uses to determine whether or not another 
server is up or down. Technically, it's a resource that can only be owned by 
one server at a time, and for which servers can negotiate for ownership. 
Negotiating for the quorum drive allows MSCS to avoid "split brain" situations 
where both servers are active and think the other server is down. (This can 
happen when, for example, the cluster interconnect is lost and network 
response time is problematic.) The use of a quorum resource is one of the 
sophisticated algorithms that Microsoft got by working with pioneers in 
clustering such as Digital and Tandem. 
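
The arbitration idea can be sketched as a resource with exactly one owner. In this Python illustration an in-memory lock stands in for the quorum disk; the class `QuorumResource` and the function `on_peer_unreachable` are hypothetical names, and the actual MSCS arbitration protocol is not shown here.

```python
import threading


class QuorumResource:
    """A resource that at most one node can own at any time (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    @property
    def owner(self):
        return self._owner

    def try_acquire(self, node: str) -> bool:
        """Claim ownership; succeeds only if the resource is free or already ours."""
        with self._lock:
            if self._owner is None or self._owner == node:
                self._owner = node
                return True
            return False

    def release(self, node: str) -> None:
        """Give up ownership; a request from a non-owner is ignored."""
        with self._lock:
            if self._owner == node:
                self._owner = None


def on_peer_unreachable(node: str, quorum: QuorumResource) -> str:
    """Decide what a node does when it can no longer hear its peer."""
    # Only the node that wins the quorum resource may take over the workload;
    # the loser must assume its peer is still running and stand down.
    return "take_over" if quorum.try_acquire(node) else "stand_down"
```

If the interconnect is lost while both servers are healthy, both call `on_peer_unreachable`, but only one of them can win, which is what prevents the "split brain" case.
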

How does MSCS improve the manageability of servers? 

MSCS gives administrators a graphical console from which they can monitor 
and manage all of the resources in a cluster as if it were a single system. 
Using the familiar standards of a Microsoft Windows graphical user interface, 
an administrator can use the cluster console to: 

■ audit the status of all servers and applications in the cluster. 

■ set up new applications, file shares, print queues, and so on, for high 
availability. 

■ administer the recovery policies for applications and resources. 

■ take applications offline, bring them back online, and move them from 
one server to another. 

The ability to graphically move workload from one server to another with 
only a momentary pause in service (typically less than a minute) means 
administrators can easily unload servers for planned maintenance without 
taking important data and applications offline for long periods of time. 

Does MSCS provide administrators with a "single system image"? 

Yes. MSCS provides administrators a single graphical console to manage all 
of the applications and resources in a cluster. The MSCS console presents 
cluster resources by physical server, and by "virtual server" (or "cluster 
group"). This allows administrators to centrally manage the cluster as a 
collection of virtual application-oriented servers, or as a collection of physical 
resources when appropriate. 

Can MSCS be remotely managed? 

Yes. An authorized user can run the MSCS administration console from any 
Windows NT Workstation or Windows NT Server on the network. 

How does MSCS help administrators do "rolling upgrades" of their 
servers? 

With MSCS, server administrators no longer have to do all their maintenance 
within those rare windows of opportunity when no users are online. Instead, 
they can simply wait until a convenient off-peak time when one of the 
servers in the cluster has enough horsepower for all of the cluster workload. 
They then point-and-click to move all the workload onto one server, and 
they're ready to perform maintenance on the unloaded server. Once the 
maintenance is complete and tested, they bring that server back online and it 
automatically rejoins the cluster, ready for work. When convenient, the 
administrator repeats the process to perform maintenance on the other 
server in the cluster. This ability to keep applications and data online while 
performing server maintenance is often referred to as doing "rolling 
upgrades" to your servers. 

Will Microsoft support "rolling upgrades" of future server products 
using MSCS clusters? 

It is Microsoft's goal to support "rolling upgrades" between releases of 
Microsoft server software using MSCS clusters. However, we cannot commit 
to this for all releases of all products. Persistent storage formats must 
occasionally change to accommodate new capabilities, and changes in 
persistent storage occasionally require applications to be taken offline while 
storage or indices are restructured. Microsoft will commit to always providing 
smooth upgrades between releases of all our products, and we'll use MSCS to 
provide seamless rolling upgrades whenever possible. 



Scalability 



How will MSCS enhance server scalability? 

The manageability benefits of the initial version of MSCS will simplify many of 
the processes currently used to improve scalability, such as upgrading server 
hardware and installing new versions of applications. 

There are scalability advantages to clustering, for example: 

■ For file sharing and print services, some of these shares/queues can be 
provided by one node and some by the second node. IIS services can be 
provided by both nodes simultaneously. 

■ For branch office automation, the total workload can be partitioned so 
that Exchange Enterprise Edition 5.5, for example, runs on one node, but 
print, file and web services are provided by the second node. 

■ For database applications, the OLTP SQL Server Enterprise Edition 6.5 
database can be on one node and the data warehouse on the other 
node. 

■ For a three tier application, the application/business logic can run on one 
node and the database on the other node. 

Writing Microsoft Cluster Server (MSCS) Resource DLLs 

Microsoft Corporation 
1997 

Abstract 

Microsoft® Cluster Server (MSCS) allows multiple Microsoft Windows NT® operating system-based 
servers to be connected together, making them appear to network clients as a single, highly available 
system. This paper provides a high-level overview of the processes involved in writing well-behaved 
cluster applications for MSCS. This document also describes how application and service developers can 
take full advantage of MSCS by writing resource dynamic-link libraries (DLLs), debugging their 
applications, and installing their applications and services in a cluster environment. 

Introduction 

Microsoft Cluster Server (MSCS) allows multiple Microsoft Windows NT operating system-based servers to be connected together, 
making them appear to network clients as a single, highly available system. From the system administrator's viewpoint, MSCS provides 
the additional advantage of easy administration and scalability, and the MSCS architecture provides a standard infrastructure for 
scalable, cluster-aware applications in future versions. 

The purpose of this document is to provide a high-level overview of the processes involved in writing well-behaved applications that can 
take advantage of the Microsoft Cluster Server capabilities. This document describes how you can take full advantage of MSCS by 
writing resource dynamic-link libraries (DLLs), debugging your applications and services, and then installing them in a cluster 
environment. 

Note This white paper assumes that you have successfully installed the Microsoft Cluster Server software in a cluster 
environment and also have the Microsoft Platform Software Development Kit (SDK) and the build environment working. If 
you are having trouble setting up the development environment, please refer to your Platform SDK documentation. 

Clustering and High Availability 

Microsoft Cluster Server allows applications and services to run more efficiently on Windows NT Server by directing client requests 
based on resource availability and server load. (In the first release of MSCS, load balancing is done manually; future releases will 
provide automatic load balancing.) If one of the systems (or nodes) in the cluster is unavailable or has failed due to hardware or 
software problems, its workload is handled by other systems in the cluster until the failed systems are brought back online. 

Note that Microsoft Cluster Server is designed to provide high availability, rather than true fault tolerance. The phrase "fault tolerant" is 
generally used to describe technology that offers a higher level of resilience and recovery. Fault-tolerant servers typically use a high 
degree of hardware or data redundancy, combined with specialized software, to provide near-instantaneous recovery from any single 
hardware or software fault. These solutions cost significantly more than a clustering solution because you must pay for redundant 
hardware that waits idly for a fault from which to recover. Microsoft Cluster Server provides a very good high-availability solution using 
standard, inexpensive hardware, while maximizing computing resources. 

The Shared-Nothing Model 

Microsoft Cluster Server version 1.0 is a two-node cluster that is based on the shared-nothing clustering model. The shared-nothing 
model dictates that while several nodes in the cluster may have access to a device or resource, the resource is owned and managed by 
only one system at a time. (In an MSCS cluster, a resource is defined as any physical or logical component that can be brought online 
and taken offline, managed in a cluster, hosted by only one node at a time, and moved between nodes.) 

Each node has its own memory, system disk, operating system, and subset of the cluster's resources. If a node fails, the other node 
takes ownership of the failed node's resources (this process is known as failover). Microsoft Cluster Server then registers the network 
address for the resource on the new node so that client traffic is routed to the system that is available and now owns the resource. 
When the failed resource is later brought back online, MSCS can be configured to redistribute resources and client requests 
appropriately (this process is known as failback). 
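
The single-owner rule and the ownership transfer on failover can be sketched as a small bookkeeping model. This Python example is only an illustration of the shared-nothing idea; the `Cluster` class and its methods are hypothetical, and choosing the first surviving node alphabetically is an assumption that happens to be trivial in a two-node cluster.

```python
class Cluster:
    """Toy model of shared-nothing ownership: each resource has one owner."""

    def __init__(self, nodes):
        self.up = set(nodes)   # nodes currently online
        self.owner = {}        # resource name -> owning node

    def assign(self, resource: str, node: str) -> None:
        """Give a resource to a node that is online (replaces any previous owner)."""
        if node not in self.up:
            raise ValueError(f"node {node!r} is not online")
        self.owner[resource] = node

    def fail_node(self, failed: str):
        """Move every resource owned by the failed node to a surviving node."""
        self.up.discard(failed)
        survivors = sorted(self.up)
        if not survivors:
            raise RuntimeError("no surviving node to take ownership")
        moved = []
        for resource, node in self.owner.items():
            if node == failed:
                self.owner[resource] = survivors[0]
                moved.append(resource)
        return sorted(moved)
```

Because ownership is a plain mapping from resource to node, no resource can ever be served by two nodes at once, which is the property that removes the need for a distributed lock manager.
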

Note When a node fails, any clients are disconnected. For the failover to be truly transparent, client applications must 
be written to reconnect in the event of node failure. 



A generic MSCS cluster setup is shown in Figure 1, below. 

[Diagram: Node A and Node B] 

Figure 1. Standard two-node MSCS configuration 

The following section provides an introduction to MSCS architecture. 

Microsoft Cluster Server Architecture 

Microsoft Cluster Server is comprised of three key components: 

• The Cluster Service 

• The Resource Monitor 

• Resource and Cluster Administrator extension DLLs 

The Cluster Service 

The Cluster Service (which is composed of the Event Processor, the Failover Manager/Resource Manager, the Global Update Manager, 
and so forth) is the core component of MSCS and runs as a high-priority system service. The Cluster Service controls cluster activities 
and performs such tasks as coordinating event notification, facilitating communication between cluster components, handling failover 
operations, and managing the configuration. Each cluster node runs its own Cluster Service. 

The Resource Monitor 

The Resource Monitor is an interface between the Cluster Service and the cluster resources, and runs as an independent process. The 
Cluster Service uses the Resource Monitor to communicate with the resource DLLs. The DLL handles all communication with the 
resource, thus shielding the Cluster Service from resources that misbehave or stop functioning. Multiple copies of the Resource Monitor 
can be running on a single node, thereby providing a means by which unpredictable resources can be isolated from other resources. 

The Resource DLL 

The third key Microsoft Cluster Server component is the resource DLL. The Resource Monitor and resource DLL communicate using the 
Resource API, which is a collection of entry points, callback functions, and related structures and macros used to manage resources. 
Applications that implement their own resource DLLs to communicate with the Cluster Service and that use the Cluster API to request 
and update cluster information are defined as cluster-aware applications. Applications and services that do not use the Cluster or 
Resource APIs and cluster control code functions are unaware of clustering and have no knowledge that MSCS is running. These 
cluster-unaware applications are generally managed as generic applications or services. 

Both cluster-aware and cluster-unaware applications run on a cluster node and can be managed as cluster resources. However, only 
cluster-aware applications can take advantage of features offered by Cluster Server through the Cluster API. For example, cluster-aware 
applications can: 

• Report status upon request to the Resource Monitor. 

• Respond to requests to be brought online or to be taken offline gracefully. 

• Respond more accurately to IsAlive and LooksAlive requests. 

MSCS includes two tools that perform basic cluster management: these tools are Cluster Administrator (CluAdmin.exe) and a 
command-line management tool (Cluster.exe). You are encouraged to write your own custom management tools if needed; however, any lengthy 
discussion of managing cluster-unaware applications or developing cluster management tools is beyond the scope of this paper. 

Figure 2 shows how the Cluster Service, Resource Monitor, and resource DLLs interact with each other on a single node running the 
Windows NT Server, Enterprise Edition operating system, cluster management applications, and both cluster-aware and cluster-unaware 
applications. 

Figure 2. MSCS components on a single node running Windows NT Server 

Note that cluster-aware applications should also implement Cluster Administrator extension DLLs, which contain implementations of 
interfaces from the Cluster Administrator extension API. A Cluster Administrator extension DLL allows an application to be integrated 
into the Cluster Administrator tool (CluAdmin.exe). Implementing custom resource and Cluster Administrator extension DLLs allows for 
specialized management of the application and its related resources, and enables the system administrator to install and configure the 
application more easily. 

Next, this paper describes resources and resource DLLs in greater detail, and describes the reasons for writing a custom resource DLL. 

Resources and Resource DLLs 

To the Cluster Service, a resource is any physical or logical component that can be managed. Examples of resources are disks, network 
names, IP addresses, databases, Web sites, application programs, and any other entity that can be brought online and taken offline. 
Resources are organized by type. Resource types include physical hardware (such as disk drives) and logical items (such as IP 
addresses, file shares, and generic applications). 

Every resource uses a resource DLL, a largely passive translation layer between the Resource Monitor and the resource. The Resource 
Monitor calls the entry point functions of the resource DLL to check the status of the resource and to bring the resource online and 
offline. The resource DLL is responsible for communicating with its resource through any convenient IPC mechanism to implement these 
methods. 
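
The relationship between the Resource Monitor and a resource DLL can be sketched as a set of entry points that a monitor calls. The example below is a Python analogy and not the Resource API itself (real resource DLLs are written against the C interfaces in the Platform SDK); the class `GenericResource`, the function `poll`, and the polling interval are hypothetical. It also assumes that LooksAlive is the cheaper, more frequent check and IsAlive the more thorough one.

```python
class GenericResource:
    """Python stand-in for a resource DLL's entry points (illustrative only)."""

    def __init__(self, name: str):
        self.name = name
        self.is_online = False
        self.healthy = True  # what a thorough health probe would report

    def online(self) -> None:
        """Bring the resource online (for example, start the application)."""
        self.is_online = True
        self.healthy = True

    def offline(self) -> None:
        """Take the resource offline gracefully."""
        self.is_online = False

    def looks_alive(self) -> bool:
        """Cheap check: is the resource nominally running?"""
        return self.is_online

    def is_alive(self) -> bool:
        """Thorough check: is the resource running and actually working?"""
        return self.is_online and self.healthy


def poll(resource: GenericResource, tick: int, is_alive_every: int = 5) -> bool:
    """Monitor-style polling: a thorough check every N ticks, a cheap one otherwise."""
    if tick % is_alive_every == 0:
        return resource.is_alive()
    return resource.looks_alive()
```

A resource that is running but unhealthy passes the cheap checks and is caught on the next thorough one, at which point the monitor would report the failure so that recovery can begin.
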

Note that applications or services that do not provide their own resource DLLs can still be configured into the cluster environment- 